


정보과학회논문지 (Journal of KIISE)

한글제목 (Korean Title): 전이 학습과 어텐션(Attention)을 적용한 합성곱 신경망 기반의 음성 감정 인식 모델
영문제목 (English Title): CNN-based Speech Emotion Recognition Model Applying Transfer Learning and Attention Mechanism
저자 (Author): 이정현 (Jung Hyun Lee), 윤의녕 (Ui Nyoung Yoon), 조근식 (Geun-Sik Jo)
원문수록처 (Citation): Vol. 47, No. 07, pp. 0665~0673 (2020. 07)
한글내용 (Korean Abstract, translated):
Existing speech-based emotion recognition research can be divided into studies that use a single voice feature and studies that use multiple voice features. Using a single voice feature makes it difficult to reflect the diverse elements of the voice, such as loudness, overtone structure, and vocal range. Among studies using multiple voice features, machine-learning-based approaches are in the majority, and their emotion recognition accuracy is relatively low compared with deep-learning-based studies. To address these problems, we propose a convolutional neural network (CNN) based speech emotion recognition model that uses the Mel-Spectrogram and MFCC (Mel Frequency Cepstral Coefficients) as voice features. The proposed model applies transfer learning and attention to improve training speed and accuracy, and achieves an emotion recognition accuracy of 77.65%, outperforming the comparison targets.
영문내용 (English Abstract):
Existing speech-based emotion recognition studies can be divided into those that use a single voice feature and those that use multiple voice features. When a single voice feature is used, it is difficult to reflect complex factors of the voice such as loudness, overtone structure, and vocal range. Among studies using multiple voice features, machine-learning-based approaches are in the majority, and their emotion recognition accuracy is relatively lower than that of deep-learning-based studies. To resolve this problem, we propose a speech emotion recognition model based on a convolutional neural network (CNN) that uses the Mel-Spectrogram and Mel Frequency Cepstral Coefficients (MFCC) as voice features. The proposed model applies transfer learning and attention to improve learning speed and accuracy, and achieves 77.65% emotion recognition accuracy, higher than the compared works.
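The two voice features named in the abstract, the Mel-Spectrogram and MFCC, are both derived from the short-time spectrum of the signal. The sketch below shows, in plain numpy, how these features can be computed; the parameter values (n_fft=512, hop=128, n_mels=40, n_mfcc=13) are illustrative defaults, not the authors' settings, and in practice a library such as librosa is normally used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr, n_fft=512, hop=128, n_mels=40):
    # Frame the signal, window each frame, take the power spectrum,
    # then project onto the mel filterbank.
    window = np.hanning(n_fft)
    frames = [y[s:s + n_fft] * window
              for s in range(0, len(y) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2
    return mel_filterbank(n_mels, n_fft, sr) @ power.T   # (n_mels, n_frames)

def mfcc(y, sr, n_mfcc=13, **kw):
    # MFCCs: log mel energies decorrelated by a DCT-II.
    logmel = np.log(mel_spectrogram(y, sr, **kw) + 1e-10)
    n = logmel.shape[0]
    k = np.arange(n)
    basis = np.cos(np.pi * (k[:, None] + 0.5) * np.arange(n_mfcc)[None, :] / n)
    return basis.T @ logmel                               # (n_mfcc, n_frames)

# Usage on one second of a 440 Hz tone at 16 kHz:
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
M = mel_spectrogram(y, sr)   # 2-D "image" a CNN can consume
C = mfcc(y, sr)
```

In a CNN-based model like the one described, the resulting 2-D arrays (frequency x time) are treated as single-channel images, which is what makes image-pretrained networks and transfer learning applicable.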
키워드 (Keyword): speech emotion recognition, Mel-Spectrogram, MFCC, convolutional neural network (CNN), transfer learning, attention
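The abstract does not specify which form of attention the model applies over the CNN features. One common variant is attention-weighted pooling over time steps, sketched below; the names (`attention_pool`, the query vector `w`) are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(features, w):
    # features: (T, D) per-time-step feature vectors from a CNN.
    # w: (D,) learned query vector scoring each time step's relevance.
    scores = features @ w          # (T,) one relevance score per step
    alpha = softmax(scores)        # attention weights, sum to 1
    pooled = alpha @ features      # (D,) weighted summary vector
    return pooled, alpha

# Usage: pool 6 time steps of 8-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
w = rng.normal(size=8)
pooled, alpha = attention_pool(feats, w)
```

Compared with plain average pooling, this lets emotionally salient frames dominate the utterance-level representation, which is the usual motivation for adding attention to a speech emotion recognizer.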